feat(hesai): add CUDA-accelerated point cloud decoder by k1832 · Pull Request #421 · tier4/nebula

k1832 · 2026-03-19T06:07:41Z

PR Type

New Feature

Description

Add a GPU-accelerated decode path for Hesai LiDAR sensors using CUDA. The feature is opt-in at two levels:

- Compile-time opt-in: Build with -DBUILD_CUDA=ON. When the CUDA toolkit is not found, the build silently falls back to CPU-only.
- Runtime opt-in: Set the NEBULA_USE_CUDA=1 environment variable. When it is unset, the existing CPU path is used with zero overhead.
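The two gates can be pictured with a minimal C++ sketch. This is an illustration under stated assumptions, not the code in this PR: the macro name `NEBULA_CUDA_ENABLED` and all function names are hypothetical, and treating only the value "1" as enabled is an assumption based on the usage shown in this description.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical macro: assumed to be defined by the build system only when
// -DBUILD_CUDA=ON was given and the CUDA toolkit was found.
#ifdef NEBULA_CUDA_ENABLED
constexpr bool k_cuda_compiled_in = true;
#else
constexpr bool k_cuda_compiled_in = false;
#endif

// Interpret the raw value of NEBULA_USE_CUDA.
// Assumption: only the exact string "1" enables the GPU path.
inline bool is_cuda_requested(const char * value)
{
  return value != nullptr && std::strcmp(value, "1") == 0;
}

// The GPU path is taken only when both gates are open.
inline bool should_use_gpu(bool compiled_in, const char * env_value)
{
  return compiled_in && is_cuda_requested(env_value);
}

// Convenience wrapper that reads the real environment; intended to be
// called once at startup so the per-packet path pays no lookup cost.
inline bool should_use_gpu_from_env()
{
  return should_use_gpu(k_cuda_compiled_in, std::getenv("NEBULA_USE_CUDA"));
}
```

Evaluating the decision once at startup and caching the result is one way to keep the CPU path free of per-packet checks.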

What it does

- Processes an entire scan in a single batched CUDA kernel launch (launch_decode_hesai_scan_batch)
- Uses pre-computed angle lookup tables (azimuth/elevation) uploaded to the GPU once at initialization
- Supports calibration-based and correction-based angle correctors
- Currently validated on the OT128 (Pandar128E4X) sensor
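To make the lookup-table idea concrete, here is a CPU-side C++ sketch of a batched decode that reads pre-computed sine/cosine tables instead of calling trigonometric functions per point. It is not the kernel from this PR: the struct and function names are hypothetical, and the axis convention (x from sin(azimuth), y from cos(azimuth)) is an assumption. On a GPU, each loop iteration would correspond to one thread.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Pre-computed sine/cosine values, built once and reused for every scan.
struct AngleLut
{
  std::vector<float> sin_v;
  std::vector<float> cos_v;
};

inline AngleLut make_lut(const std::vector<float> & angles_rad)
{
  AngleLut lut;
  for (float a : angles_rad) {
    lut.sin_v.push_back(std::sin(a));
    lut.cos_v.push_back(std::cos(a));
  }
  return lut;
}

// One raw return: a distance plus indices into the two tables (hypothetical layout).
struct RawReturn
{
  float distance_m;
  uint32_t azimuth_idx;
  uint32_t channel;  // indexes the elevation table
};

struct Point
{
  float x, y, z;
};

// Decode a whole batch in one pass; invalid returns are skipped.
inline std::vector<Point> decode_batch(
  const std::vector<RawReturn> & raw, const AngleLut & az, const AngleLut & el)
{
  std::vector<Point> out;
  out.reserve(raw.size());
  for (const RawReturn & r : raw) {
    if (r.distance_m <= 0.0f) continue;
    if (r.azimuth_idx >= az.sin_v.size() || r.channel >= el.sin_v.size()) continue;
    const float xy = r.distance_m * el.cos_v[r.channel];
    // Assumed convention: x = xy * sin(azimuth), y = xy * cos(azimuth).
    out.push_back({xy * az.sin_v[r.azimuth_idx], xy * az.cos_v[r.azimuth_idx],
                   r.distance_m * el.sin_v[r.channel]});
  }
  return out;
}
```

Because the tables depend only on calibration data, uploading them once at initialization avoids repeating the trigonometry and the transfer for every scan.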

Files changed

| File | Change |
| --- | --- |
| `hesai_cuda_kernels.cu` | New CUDA kernel for batched point cloud decoding |
| `hesai_cuda_decoder.hpp` | GPU buffer management, angle LUT, device memory |
| `hesai_decoder.hpp` | Integration: GPU scan buffer, flush, result conversion |
| `hesai_sensor.hpp` | Expose `max_scan_buffer_points()` for GPU buffer sizing |
| `angle_corrector_*.hpp` | Expose angle LUT data for GPU upload |
| `nebula_hesai_decoders/CMakeLists.txt` | CUDA library target, toolkit detection |
| `nebula_hesai/CMakeLists.txt` | CUDA decoder test target |
| `hesai_cuda_decoder_test.cpp` | 5 GPU-vs-CPU equivalence tests |

Known limitations

- The GPU kernel does not set the return_type field (it is always 0)
- Scan boundary detection differs from the CPU's ScanCutter, causing up to ~1850 points to shift between adjacent scans (out of ~72k per scan)
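For scale, ~1850 of ~72k points is roughly 2.6% of a scan. A test that compares per-scan point counts therefore needs an explicit allowance for this shift. The following C++ sketch shows one way to express that check; the function names are hypothetical and the numbers are only the approximate figures quoted above, not values taken from the test file.

```cpp
#include <cstddef>

// Absolute difference in point count between the CPU and GPU result for one scan.
inline std::size_t count_shift(std::size_t cpu_points, std::size_t gpu_points)
{
  return cpu_points > gpu_points ? cpu_points - gpu_points : gpu_points - cpu_points;
}

// True if the counts differ by no more than the allowed boundary shift.
inline bool within_boundary_tolerance(
  std::size_t cpu_points, std::size_t gpu_points, std::size_t max_shift)
{
  return count_shift(cpu_points, gpu_points) <= max_shift;
}

// Shift expressed as a fraction of a nominal scan size.
inline double shift_fraction(std::size_t shift, std::size_t points_per_scan)
{
  return static_cast<double>(shift) / static_cast<double>(points_per_scan);
}
```

Points that move across the boundary are not lost; they appear in the neighbouring scan, so a count-based check of this kind bounds the disagreement per scan without requiring identical scan cuts.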

Review Procedure

Build (with CUDA)

colcon build --packages-up-to nebula_hesai \
  --cmake-args -DBUILD_CUDA=ON -DBUILD_TESTING=ON

Requires NVIDIA CUDA Toolkit (tested with CUDA 12.x). If the toolkit is not found, the build succeeds but CUDA support is silently disabled.

Running with CUDA enabled

The GPU decode path is gated by a runtime environment variable:

# Enable GPU decoding
export NEBULA_USE_CUDA=1

# Launch the driver node as usual — it will log "GPU scan batching enabled" on startup
ros2 launch nebula_hesai ...

# To disable (default), unset the variable
unset NEBULA_USE_CUDA

Test

# Run all tests (132 existing + 5 new CUDA tests)
source install/setup.bash
colcon test --packages-select nebula_hesai --ctest-args -V

# Or run CUDA tests only
./build/nebula_hesai/hesai_cuda_decoder_test_main

Test results

[==========] Running 5 tests from 1 test suite.
[ RUN      ] HesaiCudaDecoderTest.OT128_GpuVsCpuEquivalence
[       OK ] HesaiCudaDecoderTest.OT128_GpuVsCpuEquivalence (21778 ms)
[ RUN      ] HesaiCudaDecoderTest.OT128_GpuOutputNonEmpty
[       OK ] HesaiCudaDecoderTest.OT128_GpuOutputNonEmpty (388 ms)
[ RUN      ] HesaiCudaDecoderTest.OT128_GpuFieldValidity
[       OK ] HesaiCudaDecoderTest.OT128_GpuFieldValidity (378 ms)
[ RUN      ] HesaiCudaDecoderTest.OT128_BoundaryScanPointCounts
[       OK ] HesaiCudaDecoderTest.OT128_BoundaryScanPointCounts (369 ms)
[ RUN      ] HesaiCudaDecoderTest.OT128_IntensityExactMatch
[       OK ] HesaiCudaDecoderTest.OT128_IntensityExactMatch (17217 ms)
[  PASSED  ] 5 tests.

# Full suite
Summary: 137 tests, 0 errors, 0 failures, 0 skipped

Remarks

- When CUDA is not compiled in (BUILD_CUDA=OFF), the 5 CUDA tests are compiled but skip at runtime via GTEST_SKIP(), so they do not break CPU-only CI.
- Tolerances in the equivalence tests were derived from a single OT128 rosbag. See the test file header for observed values.
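A tolerance-based equivalence check of the kind described here could look like the following C++ sketch. It assumes both clouds contain the same points in the same order, which may not hold for the real tests, and it takes the tolerance as a parameter because the actual values live in the test file header. All names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct PointXYZ
{
  float x;
  float y;
  float z;
};

// Largest absolute per-coordinate difference between two clouds.
// Returns a negative value when the sizes differ, since points cannot be paired.
inline float max_abs_diff(const std::vector<PointXYZ> & cpu, const std::vector<PointXYZ> & gpu)
{
  if (cpu.size() != gpu.size()) return -1.0f;
  float worst = 0.0f;
  for (std::size_t i = 0; i < cpu.size(); ++i) {
    worst = std::max(
      {worst, std::fabs(cpu[i].x - gpu[i].x), std::fabs(cpu[i].y - gpu[i].y),
       std::fabs(cpu[i].z - gpu[i].z)});
  }
  return worst;
}

// True if the clouds pair up and every coordinate agrees within the tolerance.
inline bool equivalent_within(
  const std::vector<PointXYZ> & cpu, const std::vector<PointXYZ> & gpu, float tolerance)
{
  const float d = max_abs_diff(cpu, gpu);
  return d >= 0.0f && d <= tolerance;
}
```

Reporting the observed maximum difference, rather than only pass or fail, makes it easier to revisit tolerances when more sensors or recordings are added.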

Pre-Review Checklist for the PR Author

PR Author should check the checkboxes below when creating the PR.

Assign PR to reviewer

Checklist for the PR Reviewer

Reviewers should check the checkboxes below before approval.

Commits are properly organized and messages are according to the guideline
(Optional) Unit tests have been written for new behavior
PR title describes the changes

Post-Review Checklist for the PR Author

PR Author should check the checkboxes below before merging.

All open points are addressed and tracked via issues or tickets

CI Checks

Build and test for PR: Required to pass before the merge.

Add a GPU decode path for Hesai LiDAR sensors, gated behind compile-time BUILD_CUDA=ON and runtime NEBULA_USE_CUDA=1 environment variable.

The implementation includes:

- CUDA kernel for batched point cloud decoding (hesai_cuda_kernels.cu)
- Angle LUT upload and GPU scan buffer management in hesai_decoder.hpp
- GPU-vs-CPU equivalence tests for OT128 (Pandar128E4X) sensor

The GPU path processes an entire scan in a single kernel launch, using pre-computed angle lookup tables and a sparse output buffer. When CUDA is not available or NEBULA_USE_CUDA is unset, the existing CPU path is used with zero overhead.

Signed-off-by: Keita Morisaki <kmta1236@gmail.com>

- Copyright year 2024 -> 2026 for new files
- Replace deprecated find_package(CUDA) with find_package(CUDAToolkit)
- Remove --expt-relaxed-constexpr flag (not needed)
- Remove unused per-packet kernel and launcher (dead code)
- Batch launcher returns bool; caller logs via NEBULA_LOG_STREAM
- Reorder CudaNebulaPoint fields for better memory packing
- Remove redundant is_multi_frame member; use n_frames > 1
- Make HesaiCudaDecoder destructor virtual
- Add int32_t range guarantee comment in angle corrector

Signed-off-by: Keita Morisaki <kmta1236@gmail.com>
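The effect of reordering struct fields on memory packing can be shown with a small C++ sketch. The field names and types below are hypothetical and do not reproduce the actual CudaNebulaPoint layout; they only illustrate how grouping members by alignment removes padding.

```cpp
#include <cstddef>
#include <cstdint>

// Small members interleaved with floats: the compiler inserts padding
// before each float so that it stays 4-byte aligned.
struct InterleavedPoint
{
  uint8_t intensity;
  float x;
  uint8_t return_type;
  float y;
  uint16_t channel;
  float z;
};

// Same members, ordered from largest to smallest alignment: the small
// fields share the space that was previously lost to padding.
struct ReorderedPoint
{
  float x;
  float y;
  float z;
  uint16_t channel;
  uint8_t intensity;
  uint8_t return_type;
};

// On common ABIs with 4-byte aligned float this is 24 bytes versus 16 bytes.
static_assert(
  sizeof(ReorderedPoint) <= sizeof(InterleavedPoint),
  "reordering should never make the struct larger here");
```

A smaller point struct reduces both the device buffer size and the volume of data copied back to the host for each scan.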

codecov · 2026-03-23T03:31:50Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 48.36%. Comparing base (baf4f92) to head (18fb65c).

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #421      +/-   ##
==========================================
+ Coverage   48.34%   48.36%   +0.02%     
==========================================
  Files         156      157       +1     
  Lines       12996    13004       +8     
  Branches     6900     6903       +3     
==========================================
+ Hits         6283     6290       +7     
- Misses       5326     5327       +1     
  Partials     1387     1387

Flag	Coverage Δ
nebula_hesai	`32.69% <100.00%> (?)`
nebula_hesai_decoders	`32.69% <100.00%> (?)`


k1832 force-pushed the feat/core-cuda-decode branch from 580316f to cd2b0e8 Compare March 23, 2026 01:32

k1832 added 2 commits March 23, 2026 12:21

k1832 force-pushed the feat/core-cuda-decode branch from cd2b0e8 to 508175b Compare March 23, 2026 03:21

fix(hesai): adapt CUDA tests for new PointCloud API (PCL removal)

09658fe

Replace .points access with direct iteration over PointCloud<T> (which now extends std::vector<T> instead of pcl::PointCloud).

Signed-off-by: Keita Morisaki <kmta1236@gmail.com>

k1832 force-pushed the feat/core-cuda-decode branch from 62ab94c to 09658fe Compare March 23, 2026 03:55

pre-commit-ci bot and others added 2 commits March 23, 2026 03:56

ci(pre-commit): autofix

08afe44

fix(hesai): resolve cpplint errors in CUDA decoder

18fb65c

- Add missing #include <string> in hesai_decoder.hpp
- Add missing #include <limits> in hesai_cuda_decoder_test.cpp
- Fix readability/braces warning for ifdef-guarded else block

Signed-off-by: Keita Morisaki <kmta1236@gmail.com>

k1832 marked this pull request as ready for review March 23, 2026 04:05

k1832 requested review from mojomex, nishikawa-masaki and veqcc March 23, 2026 04:05

k1832 wants to merge 5 commits into tier4:main from k1832:feat/core-cuda-decode
